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Q Abstract 

u 

We develop a new sampling strategy that uses the hit-and-run algo- 
^ rithm within level sets of the target density. Our method can be applied 

to any quasi-concave density, which covers a broad class of models. Our 
sampler performs well ia high-dimensional settings, which we illustrate 
with a comparison to Gibbs sampling on a spike-and-slab mixture model. 
We also extend our method to exponentially-tilted quasi-concave densi- 
^ ties, which arise often in Bayesian models consisting of a log-concave like- 

^ lihood and quasi-concave prior density. Within this class of models, our 

^ method is effective at sampling from posterior distributions with high de- 

^ pendence between parameters, which we illustrate with a simple multi- 

variate normal example. We also implement our level-set sampler on a 
Cauchy-normal model where we demonstrate the ability of our level set 
^ sampler to handle multi-modal posterior distributions. 
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1 Introduction 



Complex statistical models are often estimated by sampling random variables 
from complicated distributions. This strategy is especially prevalent within 
the Bayesian framework, where Markov Chain Monte Carlo (MCMC) simula- 
tion is used to estimate the posterior distribution of a set of unknown param- 



eters. The most common MCMC technique is the Gibbs sampler (Geman and 



Geman 1984[ ), where small subsets (often one parameter at a time) are sam- 



pled from their conditional posterior distribution given the current values of 
all other parameters. This component-wise strategy reduces the sampling from 
complicated multi-dimension distributions into sampling from a series of small 
(or one-) dimension distributions. For sampling in low dimensions, various 
techniques such as the Metropolis-Hastings algorithm ( Hastingsj 1970[ ) can be 
employed. 

However, in high dimensional parameter spaces component-wise strategies 
can encounter problems such as high autocorrelation and the inability to move 
between local modes. In high dimensions, one would ideally employ a strat- 
egy that sampled the high dimensional parameter vector directly, instead of one 
component at a time. In this paper, we develop a sampling algorithm that pro- 
vides direct samples from a (potentially high-dimensional) quasi-concave den- 
sity function. 

Consider a high-dimensional density function f(x) from which we would like 
to obtain samples x. A density function / is quasi-concave if: 

Ca = {x\ f{x) > a} 

is convex for all values of a. Our procedure is based on the fact that any hori- 
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zontal slice through a quasi-concave density / will give a convex level set above 
that slice. By slicing the quasi-concave density / at a sequence of different 
heights, we divide the density / into a sequence of convex level sets. 

We then use the hit-and-run algorithm to sample high-dimensional points x 
within each of these convex level sets, while simultaneously estimating the vol- 
ume of each convex level set. As reviewed in Vempala ( |2005 ), recent work 



suggests that the hit-and-run algorithm is an efficient way to sample from 
high-dimension convex sets as long as the start is "warm". The hit-and-run 
algorithm is not as commonplace as other sampling methods (e.g. Metropolis- 



Hastings), though Chen and Schmeiser ( 1996 ) discuss using hit-and-run Monte 



Carlo to evaluate multi-dimensional integrals. 



In Section 2.1 we review several recent results for hit-and-run sampling in con- 



vex sets. In Section 2.2 we outline our procedure for ensuring a warm start 
within each convex slice, thereby giving us an efficient way of sampling from 
the entire quasi-concave density. In Section|3| we present an empirical compar- 
ison that suggests our "level-set hit-and-run sampling" methodology is much 
more efficient than component-wise sampling schemes for high dimensional 
quasi-concave densities. 

We also extend our method to sample efficiently from exponentially-tilted quasi- 



concave densities in Section 2.4 Exponentially-tilted quasi-concave densities are 
very common in Bayesian models: the combination of a quasi-concave prior 
density and a log-concave likelihood leads to an exponentially-tilted quasi- 
concave posterior density. 

In Section |4| we implement our exponentially- tilted level-set sampler on a sim- 
ple multivariate normal density. In high dimensions, our method is more ef- 
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fective at obtaining posterior samples than a component-wise Gibbs sampling 
strategy. 

Finally, in Section |5| we illustrate the efficiency of our method on one such 
model, consisting of a Normal data likelihood and a Cauchy prior density. It 
should be noted that this posterior density can be multi-modal, depending on 
the relative locations of the prior mode and observed data. Our results in Sec- 
tion |5] suggest that our extended method can accommodate posterior distribu- 
tions which are both high-dimensional and multi-modal. 

2 Methodology and Theory 

2.1 Quasi-Concave Densities and Sampling Convex Level Sets 

Let /() be a density in a high dimensional space, then / is quasi-concave if: 

Ca = {x\ fix) > a} 

is convex for all values of a. In other words, the level set Ca of a quasi-concave 
density / is convex for any value a. Let Q denote the set of all quasi-concave 
densities and V denote the set of all concave densities. All concave densities 
are quasi-concave, V c Q, but the converse is not true. Quasi-concave densities 
are a very broad class that contains many common data models, including the 
normal, gamma, student's t, and uniform densities. 

Our proposed methodology is based on computing the volume of a convex 
level set C by obtaining random samples x from C. Let us assume we are 
already at a point x° in Ca. A simple algorithm for obtaining a random sample 
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X would be a ball walk: pick a uniform point x in a ball of size 5 around x", and 
move to X only if x is still in C. 

An alternative algorithm is hit-and-run, where a random direction d is picked 
from current point x^. This direction will intersect the boundary of the convex 
set C at some point x' . A new point x is sampled uniformly along the line seg- 
ment defined by direction d and end points x° and x^, and is thus guaranteed 
to remain in C. 



Lovasz (1999) showed that the hit-and-run algorithm mixes rapidly from a 
warm start in a convex body The warm start criterion is designed to ensure 
that our starting point is not stuck in some isolated comer of the convex body 



Vempala ( 2005[ ) suggests that a random point from convex set C e C provides 



a warm start for convex set C as long as volume(C")/volume(C) > 0.5. Vem 



pala ( |2005[ ) also presents several results that suggest the hit-and-run algorithm 



mixes more rapidly than the ball walk algorithm. 



2.2 LSHRl: Level-Set Hit-and-run Sampler for Quasi-concave 
Densities 

We harness the results of the previous section in our sampling algorithm for an 
arbitrary quasi-concave probability density. Briefly, our algorithm proceeds in 
stages where we sample from increasingly larger level sets of the quasi-concave 
density, while ensuring that each level set provides a warm start for the next 
level set. After the entire density has been covered in this fashion, our samples 
from each level set are weighted appropriately to provide a full sample from 
the quasi-concave probability density. We will see in later sections that our 
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procedure is far more efficient than standard MCMC methods when the desired 
quasi-concave density has high dimension. We outline our algorithm in more 



detail below, and also include pseudo-code in Appendix A 



Our algorithm starts by taking a small level set centered around the maximum 
value ^inax of the quasi-concave density /(■). This initial slice is defined as the 
convex shape Ci of the probability density /(■) above an initial density thresh- 
old ti. In other words, a particular point x is a valid member of slice Ci iff 

fix) > ti. 

Starting from Xmux, we run a hit-and-run sampler within this initial level set for 
m iterations. In our supplementary materials, we explore appropriate choices 
for the number of hit-and-run iterations m. 

Each iteration of the hit-and-run sampler consists of first picking a random 
directiorj^d from current point x. This direction d and current point x define a 
line segment with endpoints at the edge of the convex level set Ci. A new point 
x' is sampled uniformly along this line segment, which ensures that x' remains 
in Ci. We use {x}i to denote the set of all points sampled from Ci using this 
hit-and-run method. 

After sampling m times from convex level set Ci, we need to pick a new thresh- 
old t2 < ti that specifies a new level set of the quasi-concave probability den- 
sity /(■). However, in order for our sampler to stay "warm" as defined in Sec- 
tion 2.1 we need t2 to define a convex shape C2 with volume V2 that is less than 



twice the volume of our initial volume Vi. 

We define the volume ratio of two level sets as Ri:2 = V1/V2, so we need Ri;2 > 



^Note that we can increase the efficiency of the hit-and-run sampler for highly non-spherical 
convex shapes by scaling this random direction by the estimated covariance matrix of our sam- 
ples. See Appendix [A|for details. 
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0.5 for the two level sets Ci and C2. In addition, we want a threshold t2 that is 
not too close to ti so that we are not using an excessive number of level sets. 

We meet these criteria by first proposing a new threshold tprop and then, starting 
from current point x, running m iterations of the hit-and-run sampler within 
the new convex level set Cprop defined by f{x) > tprop- We focus on the ra- 
tio of volumes -Ri;prop = ^i/Vprop which we estimate with i?prop equal to the 
proportion of these sampled points from convex level set Cprop that were also 
contained within the previous level set Ci. 

We only accept the proposed threshold tprop if 0.55 < -Rprop < 0.8. The lower 
bound fulfills the criterion for the new level to be "warm" and the ad hoc upper 
bound fulfills our desire for efficiency: we do not want the new level set to not 
be too close in volume to the previous level set. 

If the proposed threshold is not accepted, we propose a new threshold tprop 
in an adaptive fashion (details in Appendix |A]) and repeat the same steps for 
testing this new threshold. If the proposed threshold is accepted, we re-define 
tprop = h and then the convex level set Cprop = C2 becomes our current level set. 
The collection of sampled points {x}2 from C2 are retained, as is our estimated 
ratio of slice volumes Ri;2- 

In this same way, we continue to add level sets Ck defined by threshold tk based 
on comparison to the previous level sets C^ 1. The level set algorithm termi- 
nates when the accepted threshold is lower than some pre-specified lower 
bound K on the probability density /(■). 

The result of this entire procedure is n level sets, represented by a vector of 
thresholds (ti, . . . a vector of estimated ratios of volumes {Ri;2, • • • , -Rn-i:n)/ 
and the level set collections: a set of m sampled points from each level set 
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{{X}i,...,{x}n). 

In order to obtain L essentially independenl|^ samples from the pseudo-convex 
density, we sub-sample points from the level set collections ({a;}i, . . . , {x}n), 
with each level set i represented proportional to its probability pi. 

The probability of each level set is the product of its density height and volume. 

Pi = (ti-i - U) X Vi 

We have our volume ratios Ri-A+i as estimates of V^/V^+i, which allows us to 
estimate pi by first calculating 

n 

Qi = {U-i -ti)y~\Y ^j-j+i i = l,...,n (1) 

j=i 

where to is the maximum of /(•) and Rn.n+i = 1/ and then pi = Qi/Yl Qj- 

We demonstrate our LSHRl algorithm on a spike-and-slab density in Section |3] 

2.3 Exponentially-Tilted Quasi-concave Densities 

A density g{) is a log-concave density function if \ogg{) is a concave density 
function. Let C denote the set of all log-concave density functions. All log- 
concave density functions are also concave density functions (C C V), and 
thus all log-concave density functions are also quasi-concave density functions 
c Q). 

^Although there is technically some dependence between successive sampled points from 
the hit-and-run sampler, our scheme of sub-sampling randomly from level set collections 
({x}i, . . . , {x}n) essentially removes this dependence 
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Bagnoli and Bergstrom ( 2005[ ) gives an excellent review of log-concave densi- 



ties. The normal density is log-concave whereas the student's t and Cauchy 
density are not. The gamma and beta densities are log-concave only under cer- 
tain parameter settings, e.g. both the uniform and exponential densities are log- 
concave. The beta(a, b) density with other parameter settings (e.g. a < 1, 6 < 1) 
can be neither log-concave nor quasi-concave. 

Now, let T denote the set of exponentially-tilted quasi-concave density func- 
tions. A density function h is an exponentially- tilted quasi-concave density 
function if there exists a quasi-concave density f E Q such that f{x)/h{x) = 
exp(^':r). Exponentially- tilted quasi-concave densities are a generalization of 
quasi-concave densities, so Q C T. 

These three classes of functions (log-concave, quasi-concave, and exponentially- 
tilted quasi-concave) are linked by the following important relationship: if 
X ~ / where f E Q and Y\X ~ g where g E C, then X|F ~ h where h E T. 

The consequences of this relationship is apparent for the Bayesian approach. 
The combination of a quasi-concave prior density for parameters 6 and a log- 
concave likelihood for datay\6 will produce an exponentially-tilted quasi-concave 
posterior density B\y. 

We now extend our level-set hit-and-run sampling methodology to exponentially- 
tilted quasi-concave densities, which makes our procedure applicable to a large 
class of Bayesian models consisting of quasi-concave priors and log-concave 
likelihoods. 

As mentioned above, examples of log-concave likelihoods for data y\6 include 
the normal density, exponential density and uniform density. Quasi-concave 
priors for parameters 6 are an even broader class of densities, including the 
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normal density, the t density, the Cauchy density and the gamma density 

In Section |4| we examine a simple multivariate normal density, which is log- 
concave. Then, in Section |5| we examine an exponentially-tilted quasi-concave 
posterior density resulting from the combination of a multivariate normal den- 
sity with a Cauchy prior distribution. 



2.4 LSHR2: Level-Set Hit-and-Run Sampler for Exponentially- 
Tilted Quasi-concave Densities 



In Section 2.2 we presented our level set hit-and-run algorithm for sampling x 
from quasiconcave density f{x). We now extend our algorithm to sample from 
an exponentially-tilted quasi-concave density h{x). 



As mentioned in Section 2.3 exponentially-tilted quasi-concave densities com- 
monly arise as posterior distributions in Bayesian models. In keeping with the 
usual notation for Bayesian models, we replace our previous variables x with 
parameters 6. These parameters 9 have an exponentially- til ted quasi-concave 
posterior density h{6\y) arising from a log-concave likelihood g{y\9) and quasi- 
concave prior density f{6). 

Our LSHR2 algorithm starts just as before, by taking a small level set centered 
around the maximum value ^max of the quasi-concave prior density /(■). This 
initial level set is defined, in part, as the convex shape of the probability density 
/(•) above an initial density threshold ti. 

However, we now augment our sampling space with an extra variable p where 
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p < g{y\ log^). Letting B* = {6,p), our convex shape is now 

D, = {0* : f{e)>h,p<logg{y\e)} 



Within this new convex shape, we run an exponentially-weighted version of 
the hit-and-run algorithm. Specifically, a random {d + l)-dimensional direction 
d is samplec^ which, along with the current 6*, defines a line segment with 
endpoints at the boundaries of Di. Instead of sampling uniformly along this 
line segment, we sample a new point (6*)' from points on this line segment 
weighted by exp(p). 

The remainder of the LSHR2 algorithm proceeds in the same fashion as our 
LSHRl algorithm: we construct a schedule of decreasing thresholds ti,t2, ■ ■ ■ 
and corresponding level sets Di,D2, . . . such that each level set is a warm 
start for the next level set -Dfe+i. 

Within each of these steps, the exponentially-weighted hit-and-run algorithm 
is used to sample values 6* within the current level set. As before, is a warm 
start for the next level set Dk+i if the ratio of volume^ -Rfc:fc+i = Vk/Vk+i that 
lies within the range 0.55 < Rk-.k+i < 0.8 The algorithm terminates when our 
decreasing thresholds tk achieve a pre-specified lower bound K. 

Our procedure results in n level sets {Di, . . . , represented by a vector of 
thresholds (ti, . . . tn), a vector of estimated ratios of volumes {Ri:2, • • • , Rn-i-.n), 

gain we can increase the efficiency of the hit-and-run sampler for highly non-spherical 
convex shapes by scaling this random direction by the estimated covariance matrix of our sam- 
ples. See Appendix|B]for details. 

^We continue to use the term "volume" even though these convex shapes are a combination 
of d-dimensional 9 and the extra log-density p dimension. 
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and the level set collections: a set of m sampled points from each level set: 



Finally, we obtain L samples (^^, . . . , 6*^) by sub-sampling points from the level 
set collections ({^*}i, . . . , {0*}n), with each level set i represented proportional 
to its probability pi. The probability of each level set is still calculated as in ([l]). 

By simply ignoring the {d+l)-th sampled dimension p, we have samples {6i,. . . ,6l) 
from the exponentially- tilted pseudo-convex posterior density h{6\y). Details 
of our LSHR2 algorithm are given in Appendix |Bj We demonstrate our LSHR2 
algorithm on a multivariate normal density in Section |4] and a Cauchy-normal 
model in Section HI 



2.5 Component-wise Alternatives: Gibbs and Metropolis-Hastings 

Although there have been many recent advances in MCMC methods, we fo- 
cus on our comparisons on the Gibbs sampler as it remains the most popular 
method for obtaining samples from high-dimensional distributions. The Gibbs 



sampler (Geman and Geman[ 1984) is a Markov Chain Monte Carlo approach 



to sampling from a multi-dimensional density f{x). 

The basic Gibbs strategy is to sample a single component xt from its univariate 
conditional distribution /(xj|x_j) given all other components Successively 
iterating this sampling strategy through each component i = 1, . . . ,n will give 
a full sample x. These sampled a;'s will converge in distribution to the target 
distribution f{x). 

The benefit of a Gibbs approach is that sampling from a high-dimensional den- 
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sity f{x) is reduced to sampling from a series of one-dimensional densities 
f{xi\x^i). The Gibbs sampler is most attractive when these one-dimensional 
conditional distributions each have a standard form but even if this is not the 
case, the Metropolis-Hastings algorithm ( [Hastingsj 1970[ ) can be employed for 
sampling from non-standard distributions. 

The Gibbs sampler is not always strictly component-wise: for some models, 
blocks of variables can be sampled in a single step from a standard distribu- 
tion. That said, a Gibbs strategy for the sampling of a high-dimensional x will 
typically involve sampling a series of small subsets of x. 

The component-wise approach of the Gibbs sampler has both strengths and 
weaknesses. The relative simplicity of sampling from conditional distributions 
is a clear strength. However, a potential weakness is that high correlation can 
lead to extremely slow convergence to the target distribution. Our own ap- 
proach is motivated by the belief that it is advantageous to sample the entire 
set of variables simultaneously when possible. 



3 Empirical Study 1: Spike-and-slab Density 

We consider the problem of sampling from a multivariate density which con- 
sists of a 50-50 mixture of two normal distributions, both centered at zero but 
different variances. Specifically, the first component has variance So with off- 
diagonal elements equal to zero and diagonal elements equal to 0.05, whereas 
the second component has variance Eq with off-diagonal elements equal to zero 
and diagonal elements equal to 3. Figure [T] gives this spike-and-slab density in 
a single dimension. The spike-and-slab density is commonly used as a mixture 
model for important versus spurious predictors in variable selection models 
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Figure 1: Spike-and-slab density: gray lines indicate density of the two components, black line 
is the mixture density 




( [George and McCullochj [19971 ) 



This density is quasi-concave and thus amenable to our proposed level-set hit- 
and-run sampling methodology. Sampling from this density in a single dimen- 
sion can be easily accomplished by grid sampling or Gibbs sampling using data 
augmentation. However, we will see that these simpler strategies will not per- 
form adequately in higher dimensions whereas our LSHRl sampling strategy 
continues to show good performance in high dimensions. 

3.1 Gibbs Sampling for Spike and Slab 

The usual Gibbs sampling approach to obtaining a sample x from a mixture 
density is to augment the variable space with an indicator variable / where 
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1 = 1 indicates the current mixture component from which x is drawn. The 
algorithm iterates between 

1. Sample x from mixture component dictated by /, i.e. 

f N(0,Ei) if 
[ N(0,Eo) if 

2. Sample / with probability based on x: P{I 

where (j){x, n, E) is the density of a multivariate normal with mean fi and vari- 
ance E evaluated at x. 

This algorithm mixes well when x has a small number of dimensions {d < 10), 
but in higher dimensions, it is difficult for the algorithm to move between the 
two components. We see this behavior in Figure |2| where we evaluate results 
from running the Gibbs sampler for 100000 iterations on the spike-and-slab 
density with different dimensions {d = 2, 5, 10, 15). Specifically, we plot the 
proportion of the Gibbs samples taken from the first component (the spike) for 
X of different dimensions. 

The mixing of the sampler for lower dimensions {d = 2 and = 5) is reasonable, 
but we can see that for higher dimensions the sampler is extremely sticky and 
does not mix well at all. In the case oi d = 10, the sampler only makes a couple 
movies between the spike component and the slab component during the run. 
In the case oi d = 15, the sampler does not move into the second component at 
all during the entire 100000 sample run. We see this behavior more explicitly in 
Figure |3] where we plot the number of switches between components from the 
Gibbs sampling run for each dimension. Even higher dimensions {d = 20,d = 



I = 1 
1 = 



(/)(a:,0,Si) 
0(a;,O,Ei)+</>(a;,O,Si) 
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25) were similar to the d— 15 case. 



Figure 2: Empirical Mixing Proportion after each iteration is calculated as the proportion of 
samples up to that iteration that are from the first component (the spike). Black lines indicate 
the empirical mixing proportion for the Gibbs sampler, gray lines indicate the true mixing 
proportion of 0.5 
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Figure 3: Number of component switches is the number of titnes during each Gibbs sampler 
run (out of 100000 iterations) that the algorithm moved from one mixture component to the 
other. 



8 

Dimension 



— 

10 



12 



— 

14 



16 



In high dimensions, we can estimate how often the Gibbs sampler will switch 
from one domain to the other. When the sampler is in one of the components, 
the variable x will mostly have a norm of ||x||2 ~ da'^ where a = ctq = .05 for 
component zero and a = ai = 3.0 for component one. For a variable currently 
classified into the zero component, the probability of a switch is 

p (I =i\Mi^ da!) ^(^^y 

where the approximation holds if (Jq ^ o"i • Details of this calculation are given 
in Appendix|C} This result suggests that the expected number of iterations until 
a switch is about (cri/aoy/eY = 36. 4^^. This approximate switching time will be 
more accurate for large d. 



3.2 Level-set Hit-and-run Sampling for Spike and Slab 

Our LSHRl sampling methodology was implemented on the same spike-and- 
slab distribution. Our sampling algorithm does not depend on a data aug- 
mentation scheme that moves between the two components. Rather, we start 
from the posterior mode and move outwards in convex level sets, while run- 
ning a hit-and-run algorithm within each level set. As outlined in Section 2.4[ 



each convex level set is defined by a threshold on the density that is slowly 
decreased in order to assure that each level set has a warm start. 

Figure |4] shows the number of thresholds needed to fully explore the spike- 
and-slab density. Even for high dimensions {d = 10 and d = 20), we see that the 
spike-and-slab density is explored with a reasonable number of thresholds, and 
the linear trend suggests that our algorithm would scale well to even higher 
dimensions. 
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Figure 4: Number of thresholds needed for our level-set hit-and-run sampler on the spike-and- 
slab density as function of dimension. 




The most striking comparison of the superior performance of our LSHRl sam- 
pling strategy comes from examining the sampled values in any particular di- 
mension relative to the true quantiles of the spike-and-slab mixture density In 
Figure |5| we plot the true quantiles against the quantiles of the first dimension 
of sampled values from both the Gibbs sampler and the warm sampler. 

We see that for low dimensions {d = 2), samples from both the Gibbs sampler 
and warm sampler match the correct spike-and-slab distribution. However, for 
higher dimensions {d = 20), the Gibbs sampler provides a grossly inaccurate 
picture of the true distribution, due to the fact that the Markov chain never es- 
capes the spike component of the distribution. In contrast, the samples from 
our level-set hit-and-run sampler still provide a good match to the true distri- 
bution in high dimensions. We have checked even higher dimensions {d = 30) 
and the level-set hit-and-run sampler still provides a good match to the true 
distribution. 
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Figure 5: Comparing level-set hit-and-run sampler and Gibbs sampler in terms of quantile- 
quantile plot of estimated vs. true spike-and-slab distribution. Top row is the low dimension 
{d = 2) case, bottom row is the high dimension {d = 20) case. 





4 Empirical Study 2: Multivariate Normal 

In our second empirical study, we compare our LSHR2 algorithm to the Gibbs 
sampler for obtaining samples from the multivariate normal density, 

e ~ Normal(0,E) (2) 

This is a superficial setting in the sense that it is easy to obtain samples 6 directly 
from a multivariate normal density However, this simple example is still illus- 
trative of differences in performance between our LSHR2 level-set sampler and 
a component-wise Gibbs sampler. 

We considered several choices of dimension d (d = 2, 10, 20, 30) and also sev- 
eral specifications of the data variance E. Specifically, we assume all diagonal 
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elements of E are one, but we consider several different values for off-diagonal 
elements p (p = 0, 0.5, 0.9, 0.99). 

We use our LSHR2 algorithm to obtain samples 6 by re-formulating (j2]) as the 
exponentially-tilted quasi-concave posterior density that results from a log- 
concave multivariate normal likelihood oi y = 0|^,E and a bounded uniform 
prior density on 6 (i.e. 9k ~ Unif(— 6, 6) for k = 1, . . . , d). In this simple situa- 
tion where our quasi-concave prior density is uniform, we only have to use one 
level set in our LSHR2 algorithm. 

We also use a component-wise Gibbs sampler to obtain samples of d by iter- 
atively sampling from the conditional normal densities of 6k\6(-k)- Both the 
Gibbs sampler and LSHR2 sampler were run for one million iterations. In the 
case of the Gibbs sampler, the first 50000 iterations were discarded as bum-in. 

In Figure |6| we examine the performance of our LSHR2 algorithm versus the 
Gibbs sampler when d = 2 and both dimensions are independent (i.e. p = 0). 
Specifically, we give a histogram of the samples for the first dimension 9i of 6 
and the empirical autocorrelation function of the 9i samples for both Gibbs and 
LSHRl algorithms. 

We see in Figure [6]that the samples of 9i from both algorithms are a close match 
to the true density. We also see that the Gibbs sampler has less autocorrelation 
than our level-set procedure in this setting where the true correlation between 
the dimensions of 6 is zero. Compare this observation with Figure |7| where we 
examine the performance of our LSHR2 algorithm versus the Gibbs sampler 
when d = 2 but the correlation between the dimensions is high, i.e. p = 0.99. 

When the dimensions of 6 are highly correlated, we see in Figure [6] that the 
performance of the Gibbs sampler is severely degraded, as evidenced by very 
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high autocorrelation of sampled values. In contrast, the autocorrelation of our 
LSHR2 algorithm is essentially unchanged between Figures [6] and |7| which in- 
dicates our methodology is robust to high dependence among the parameters 

e. 



These same results were seen to an even greater extent when we compared the 
performance of the Gibbs sampler and LSHR2 algorithms in higher dimensions 
{d = 10, 20, 30). Even in this artificial setting of a multivariate normal density 
with high correlation, we see that our level set methodology offers advantages 
over a component-wise Gibbs sampling strategy. In the next section, we exam- 
ine a more complicated model where our LSHR2 algorithm also out-performs 
Gibbs sampling. 

Figure 6: Comparison of samples of from multivariate normal density with d — 2 and p = 0. 
Top row is samples from Gibbs sampler. Bottow row is samples from LSHR2 algorithm. Left 
column is the histograms of the sampled values of 6i (red line indicates true density). Right 
column is the empirical autocorrelation functions of the sampled values of of di . 
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Figure 7: Comparison of samples of 6 from multivariate normal density with d = 2 and p = 
0.99. Top row is samples from Gibbs sampler. Bottow row is samples from LSHR2 algorithm. 
Left column is the histograms of the sampled values of Oi (red line indicates true density). 
Right column is the empirical autocorrelation functions of the sampled values of of 9i . 
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5 Empirical Study 3: Cauchy-normal model 

In our third empirical comparison, we compare our LSHR2 algorithm to the 
Gibbs sampler on a Bayesian model consisting of a multivariate normal likeli- 
hood and a Cauchy prior density. 



y\B ~ Normal(e, CT^I) 
6 ~ Cauchy(0,I) 



(3) 



where y and 6 are d dimensions and cr^ is fixed and known. As mentioned 



in Section 2.3 this combination of a log-concave density g{y\6) and a quasi- 
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concave density for f{9) gives an exponentially-tilted quasi-concave posterior 
density h{6\y). 

For some combinations of and observed y values, the posterior density h{6\y) 
is multi-modal. Figure |8] gives examples of this multi-modality in one and two 
dimensions. The one-dimensional in Figure |8^ has y = 10 and = 10.84, 
whereas the two-dimensional h{6\y) in Figure [sj? has y = (10,10) and = 
12.57. In higher dimensions d, we used y = (10, . . . , 10) and a"^ = d ■ 10'^ /{{d + 
1) log(l + d ■ 10^)), which is a formula derived from equating the density value 
at the prior mode to the density at the likelihood mode. 

Figure 8: Posterior density h{0\y — 10) for Cauchy-normal model in (a) one and (b) two dimen- 
sions. 

(a) Posterior Density (d=1) (b) Posterior Density (d=2) 
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In higher dimensions {d > 3), it is difficult to evaluate (or sample from) the true 
posterior density h(6\y) which we need in order to have a gold standard for 
comparison between the Gibbs sampler and LSHR2 algoirithm. Fortunately, 
for this simple model, there is a rotation procedure that we detail in our sup- 
plementary materials that allows us to accurately estimate the true posterior 
density h{9\y). 
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5.1 Gibbs Sampling for Cauchy Normal Model 



The Cauchy-normal model (jSj) can be estimated via Gibbs sampling by aug- 
menting the parameter space with an extra scale parameter r^, 

y\e ~ Normal(^, (T^I) 



T^)-^ ~ Gamma ( -, - ) (4) 



^Ir^ ~ Normal(0,r2l 
'1 1 
,2' 2 

Marginalizing over gives us the same Cauchy (0, 1) prior for 6. The Gibbs 
sampler iterates between sampling from the conditional distribution of ^|r^,y: 

6\r\y ~ Normal , ^^l) (5) 



and the conditional distribution of r^|^, y: 



(r^) ^1^,1/ ~ Gamma ——, (6) 



For several different dimensions d, we ran this Gibbs sampler for one million 
iterations, with the first 500000 iterations discarded at bum-tn. In Figure |9| 
we compare the posterior samples for the first dimension Q\ from our Gibbs 
sampler to the true posterior distribution for d = 1, 2, 10 and 20. 

We see that the Gibbs sampler performs poorly at exploring the full posterior 
space in dimensions greater than one. Even in two dimensions, the Gibbs sam- 
pler struggles to fully explore both modes of the posterior distribution and this 
problem is exacerbated in dimensions higher than two. 
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Figure 9: Posterior samples of the first dimension 9i of 6 from Gibbs sampler for Cauchy- 
normal model in different dimensions d. Red curve in each plot represents the true posterior 
density. 
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5.2 LSHR2 Level Set Sampling for Cauchy-Normal Model 

Our LSHR2 level set sampling methodology was implemented on the same 
Cauchy-normal model. We start from the prior mode and move outwards in 
convex level sets, while running an exponentially-tilted hit-and-run algorithm 
within each level set in order to get samples from the posterior distribution. 
As outlined in Section |2.4[ each convex level set is defined by a threshold on 
the density that is slowly decreased in order to assure that each level set has a 
warm start. 



Figure 10 shows the number of thresholds needed to fully explore the posterior 
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distribution of the Cauchy-normal model for different number of dimensions. 
Even for high dimensions {d = 10 and d = 20), we see that the posterior density 
is explored with a reasonable number of thresholds, and the approximately 
linear trend suggests that our algorithm would scale well to even higher di- 
mensions. 

Figure 10: Number of thresholds needed for our level-set hit-and-run sampler on the posterior 
density of the Cauchy-normal model as function of dimension. 
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In Figure 11 we compare the posterior samples for the first dimension 6i from 
our LSHR2 sampler to the true posterior distribution for d = 1,2, 10 and 20. 
Our LSHR2 level set sampler (with exponential tilting) samples closely match 
the true posterior density in both low and high-dimensional cases. Comparing 
Figures |9] and Figures 11 clearly suggests that our level set hit-and-run method- 
ology gives more accurate samples from the Cauchy-normal model in dimen- 
sions higher than one. 
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Figure 11: Posterior samples of the first dimension 9i of from LSHR2 sampler for Cauchy- 
normal model in different dimensions d. Red curve in each plot represents the true posterior 
density. 
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6 Discussion 

In this paper, we have developed a general sampling strategy based on divid- 
ing a density into a series of level sets, and running the hit-and-run algorithm 
within each level set. Our basic level-set hit-and-run sampler (LSHRl) can be 
applied to the broad class of pseudo-concave densities, which includes most 
distributions used in applied statistical modeling. We illustrate our LSHRl 
sampler on spike-and-slab density (Section |3]), where our procedure performs 
much better in high dimensions than the usual Gibbs sampler. 
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We also extend our sampling methodology to an exponentially-tilted level-set 
hit-and-run sampler (LSHR2). Our LSHR2 algorithm can be applied to log- 
concave densities, which we illustrate with a simple multivariate normal ex- 
ample in Section |4| In this setting, our LSHR2 algorithm is more effective than 
Gibbs sampling when there is high dependence between parameters. 

Our LSHR2 can also be applied to exponentially-tilted quasi-concave densi- 
ties, which arise frequently in Bayesian models as the posterior distribution 
formed from the combination of a log-concave likelihood and quasi-concave 
prior density We illustrate our LSHR2 sampler on the posterior density from 
a Cauchy normal model (Section |5]), which can exhibit multi-modality under 
certain parameter settings. Our procedure is able to sample effectively from 
this multi-modal posterior density even in high dimensions, where the Gibbs 
sampler performs poorly. 
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A The LSHRl Algorithm 



In this appendix, we give pseudo-code for our LSHRl level-set hit-and-run 
sampler described in Section 2.2 We must pre-specify a minimum value K of 
the probability density /(■) as our stopping threshold. The first threshold ti 
must be pre-specified, and we arbitrarily use ti = 0.95 * /(aCmax)- Finally, the 
number of hit-and-run iterations per level set, m, must be chosen. In our spike- 
and-slab example, we set the number of hit-and-run iterations per level set to 
m = 1000 based on the evaluation presented in our supplementary materials. 



Algorithm 1 LSHRl: Level-set Hit-and-run Sampler 



1: initialize x = Xmax and Si = I 

2: initialize ti and set k = 1 

3: define current level set Ci as {x : f{x) > ti} 

4: for m iterations do 

5: sample random direction d 

6: calculate boundaries o and b of Ci along direction d 
7: sample new x uniformly between a and b 
8: store all samples x in {x}i 

9: while tk > K do 
10: propose new tprop < 

11: define proposed level set Cprop as {x : f{x) > tprop} 
12: for m iterations do 

1/2 

13: sample random direction d* and set d = d* 

14: calculate boundaries a and b of Ci along direction d 

15: sample new x uniformly between a and b 

16: if f{x) > tk then^fc.prop = Rk.pvop + 1 

17. -f^fciprop -^fc:prop/'^ 

18: if 0.55 < ^i:prop < 0.8 then 
19: k = k + l 

20: tk = iprop/ Cjs = Cprop/ Rk:k+1 — -?^fc:prop/ ^fc+l = Cov(x) 

21: store all samples x in {x}k 

22: define n = number of level sets 

23: for i = 1, . . . ,n do 

24: Qi = -ti)x HLi ^k:k+i 

25: where to is the maximum of /(•) and Rn-n+i = 1 

26: forZ = l,...,Ldo 

27: Sample level set s wp. Ps = Qs/ J2Qs 

28: Sample point Xi randomly from level set collection {x}s 



When considering the proposal for level set k + 1, the initial tprop is set to tk — 
{tk-i — tk), so that our proposal moves the same distance in t as the previous 
successful move. If this initial tprop is rejected for not being warm enough, we 
choose a new proposal tprop as the midpoint of the old proposal and tk, and so 
on. 
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B The LSHR2 Algorithm 



In this appendix, we give pseudo-code for LSHR2, our exponentially-weighted 
level-set hit-and-run sampler procedure described in Section 2.4 We must pre- 
specify a minimum value K of the probability density /(■) as our stopping 
threshold. The first threshold ti must be pre-specified, and we arbitrarily use 
ti = 0.95 * /(^max) where ^max is the mode of f{6). We set the number of hit- 
and-run iterations per level set to m = 1000. 



Algorithm 2 LSHR2: Exponentially-tilted Level-set Hit-and-run Sampler 



1: initialize 6 = ^max and Si = I 

2: initialize p = log (^(^niax) and 0^ = {6,p) 

3: initialize ti and set k = 1 

4: define current level set Di as {6* : f{6) > ti , p < log g{y\6)} 
5: for m iterations do 

6: sample random direction dm{d + 1) dimensions 
7: calculate boundaries a and b of Di along direction d 

8: sample new 9* proportional to exp(p) from line segment between a and b 
9: store all samples 6* in {^*}i 

10: while tk > K do 

11: propose new tprop < 

12: define proposed level set Dprop as {6* : f{6) > tprop , P < logg{y\0)} 
13: for m iterations do 

1 /2 

14: sample random direction d* in {d + 1) dimensions and set d = S^' d* 

15: calculate boundaries a and b of Dprop along direction d 

16: sample new 6* proportional to exp(p) from line segment between a and b 

17: if new e* G Dj, then i?fc:pi.op = Rk-.prop + 1 

18. -Kfciprop -^fc:prop/^ 

19: if 0.55 < ^i:prop < 0.8 then 
20: k = k + l 

21: tk = iprop; = Dpj-op} Rk-k+i = Rk-.prop'/ ^fc+i = Cov(a;) 

22: store all samples ^* in 

23: define n = number of level sets 

24: for i = 1, . . . ,n do 

25: Qi = - ti) X nfc=j Rk:k+i 

26: where to is the maximum of /(•) and Rnm+i = 1 

27: forZ = l,...,Ldo 

28: Sample level set s wp. = Qs/Yl 

29: Sample point 6^ randomly from level set collection {6*}s 



When considering the proposal for level set k + 1, the initial tprop is set to tk — 
{tk-i — tk), so that our proposal moves the same distance in t as the previous 
successful move. If this initial tprop is rejected for not being warm enough, we 
choose a new proposal tprop as the midpoint of the old proposal and tk, and so 
on. 
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C Gibbs Switching Calculation 



For a variable in the zero component, with norm | |x| || da^, the probability of 
a switch is 

P{l = l\Ml^dal) = 





X 


\l/2al) 


cr^ '^exp(- 




\l/2aj) + 


(7/exp(-| 


x| 





'^exp(-dao/2a^) 

i 

exp (I) 



. d 

£0 



;^)'exp(i) + i 

since cto ^ (Ji. Now, as d increases, (cto/cti)'^ goes to zero, and so 
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CTl 



exp 



exp(i) + l 
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